ABSTRACT

We examine the line-of-sight clustering of QSO heavy-element absorption-line
systems, using a new measure of clustering, called the reduced second moment measure,
$K(r)$, that directly measures the mean over-density of absorbers on scales $\lesssim r$. This
measure, while closely related to other second-order measures such as the correlation
function or the power spectrum, has a number of distinct statistical properties which
make possible a continuous exploration of clustering as a function of scale. From a
sample of 352 C IV absorbers with median redshift $\langle z \rangle = 2.2$, drawn from the spectra
of 274 QSOs, we find that the absorbers are strongly clustered on scales from 1 to
20 $h^{-1}$ Mpc. Furthermore, there appears to be a sharp break at 20 $h^{-1}$ Mpc, with
significant clustering on scales up to 100 $h^{-1}$ Mpc in excess of that which would be
expected from a smooth transition to homogeneity. There is no evidence of clustering
on scales greater than 100 $h^{-1}$ Mpc. These results suggest that strong C IV absorbers
along a line of sight are indicators of clusters and possibly superclusters, a relationship
that is supported by recent observations of “Lyman break” galaxies.


Subject headings: cosmology: observations — intergalactic medium — large-scale 
structure of universe — methods: statistical — quasars: absorption lines 


1. INTRODUCTION


In a previous series of investigations (Vanden Berk et al. 1996; Quashnock, Vanden Berk, &
York 1996; Quashnock & Vanden Berk 1998), the clustering properties of C IV and Mg II absorbers
have been investigated, using an extensive catalog of heavy-element absorption-line systems
drawn from the literature.$^1$ These authors used a line-of-sight correlation function analysis and
found evidence for strong (and evolving) power-law clustering on comoving scales of 1 to 16
$h^{-1}$ Mpc of a form that is consistent with that found for galaxies and clusters at low redshift, and
of amplitude such that absorbers are correlated on scales of clusters of galaxies. Furthermore,
there also appears to be superclustering on scales of 50 to 100 $h^{-1}$ Mpc (Quashnock et al. 1996),
suggesting that these absorbers are biased tracers of the higher-density regions of space, and that
agglomerations of strong absorbers along a line of sight are indicators of clusters and superclusters.

$^1$ Contact D. E. Vanden Berk (danvb@astro.as.utexas.edu) for a preliminary version of the catalog; see York et al.
(1991) for an earlier version.


This relationship is supported by recent observations of so-called “Lyman break” galaxies
(Steidel et al. 1998) that were found to be concentrated in coherent structures of size $\sim 10$
$h^{-1}$ Mpc. These structures were found to contain metal-line systems. Also, the amplitude
of the correlation function of these Lyman break galaxies at $z = 3.04$ ($r_0 = 2.1 \pm 0.7$ $h^{-1}$ Mpc
[$q_0 = 0.5$]; Giavalisco et al. 1998) is consistent with that found for C IV absorbers ($r_0 = 2.2$ $h^{-1}$ Mpc;
Quashnock & Vanden Berk 1998). While the exact relationship between high-redshift galaxies
and heavy-element absorbers is unclear, it does appear that these systems are tracing the richer
agglomerations of the clustering network, perhaps one that is similar to that found in detailed
three-dimensional numerical investigations of the distribution of the richest Ly$\alpha$ absorbers (see,
e.g., Zhang et al. 1998).


Thus it is of great interest to measure and characterize the clustering of the absorbers
over as broad a range in scale as possible, with special attention given to the largest scales,
using the best statistical tools at hand. Quashnock et al. (1996) were unable to relate
the superclustering found on very large ($\sim 100$ $h^{-1}$ Mpc) scales to the power-law clustering
found later on smaller scales (Quashnock & Vanden Berk 1998), or to locate the approximate scale
dividing these two regimes, because they used a two-point correlation function analysis requiring
bins too large (25 $h^{-1}$ Mpc) for this purpose.

Here, we examine the line-of-sight clustering of QSO heavy-element absorption-line systems,
using a new measure of clustering, called the reduced second moment measure, $K(r)$, that directly
measures the mean over-density of absorbers on scales $\lesssim r$. This measure, while closely related
to other second-order measures such as the correlation function or the power spectrum, has a
number of distinct statistical properties which make possible a continuous exploration of clustering
as a function of scale. It has been well studied by statisticians (Ripley 1988; Baddeley 1998) and
recently by astrophysicists (Martínez et al. 1998), and several estimators have been developed for it.


The absorber catalog, with a total of over 2200 absorbers listed for over 500 QSOs, permits
exploration of clustering over a large range in scale (from about 1 to over 100 $h^{-1}$ Mpc) and
redshift ($z$ from about 1 to 4). Ultimately, we are interested in a three-dimensional description of
the absorber distribution; nevertheless, much of the useful information about this distribution lies 
in the one-dimensional distribution of the absorption-line systems along the lines of sight to QSOs 
(see, e.g., Crotts et al. 1985). The large number of such lines of sight makes it possible to make 
some inferences about three-dimensional clustering from one-dimensional statistical measures. 


The outline of the paper is as follows: In § 2 we define the reduced second moment measure,
present the estimator we have used for it, and discuss its statistical properties. In § 3 we present
our results for the reduced second moment measure, using a large sample of C IV absorbers with
median redshift $\langle z \rangle = 2.2$. In § 4 we discuss the implications of these results for our picture of
absorber clustering.


2. THE REDUCED SECOND MOMENT MEASURE 

Here we assume that the clustering of absorbers is stationary (does not depend on time)
and homogeneous (does not depend on direction or location). The first assumption is likely not
to be strictly true, since growth of the correlation with decreasing redshift has been detected
(Quashnock & Vanden Berk 1998). Thus our results here are averages for our sample, which has a
characteristic redshift given by the median $\langle z \rangle = 2.2$. We follow the usual convention and take the
Hubble constant, $H_0$, to be 100 $h$ km s$^{-1}$ Mpc$^{-1}$ and take $q_0 = 0.5$ and $\Lambda = 0$.


2.1. Definitions 


Consider the process of absorber locations along some line of sight, and let $\mathcal{N}$ be the mean
number of absorbers per unit comoving length. Define the reduced second moment measure, $K(r)$,
as the conditional expectation, or average, given that there is an absorber at $x_i$, of the number
of absorbers (other than the one at $x_i$ itself), $N(x_i, r)$, that are within a comoving distance $r$ of
$x_i$, normalized by $\mathcal{N}$:

$$K(r) \equiv \frac{1}{\mathcal{N}} \, E\left[ N(x_i, r) \mid \text{absorber at } x_i \right] . \qquad (1)$$


Because of our assumption of homogeneity, the expected number of absorbers in equation (1) does
not depend on $x_i$. With $q_0 = 0.5$ and $\Lambda = 0$, the comoving distance $r$ between two absorbers at
redshifts $z_i$ and $z_j$ is $r = (2c/H_0) \times |1/\sqrt{1 + z_i} - 1/\sqrt{1 + z_j}|$.
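As a minimal sketch of this distance formula (the function and constant names are our own, chosen for illustration), the comoving separation in $h^{-1}$ Mpc can be computed as:

```python
import math

C_KM_S = 299792.458  # speed of light in km/s
H0_OVER_H = 100.0    # Hubble constant in units of h km/s/Mpc

def comoving_separation(z1, z2):
    """Comoving separation in h^-1 Mpc, assuming q0 = 0.5 and Lambda = 0."""
    hubble_length = C_KM_S / H0_OVER_H  # c/H0 in h^-1 Mpc
    return 2.0 * hubble_length * abs(
        1.0 / math.sqrt(1.0 + z1) - 1.0 / math.sqrt(1.0 + z2)
    )
```

The result is symmetric in the two redshifts and vanishes when they coincide, as expected for a separation.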


In terms of the two-point correlation function $\xi(r)$ (Peebles 1980, 1993), the reduced second
moment measure is given by

$$K(r) = 2 \int_0^r du \, \left[ 1 + \xi(u) \right] . \qquad (2)$$

If no correlations are present, then $K(r) = 2r$. Simply put, in this case the number of surrounding
absorbers within distance $r$ of $x_i$ would not depend on the fact that there is an absorber at
$x_i$, and would simply be equal to $2r\mathcal{N}$. (The factor 2 arises because we consider distinct
absorbers within a distance $r$, or in the interval $(x_i - r, x_i + r)$ around any given absorber.)
The quantity $K(r)/2r \equiv 1 + \rho(r)$ is then a measure of the relative mean density of absorbers
around other absorbers, averaged over scales less than $r$. The relative mean over-density, $\rho(r)$,
can be written in terms of the power spectrum, $P(k)$, the Fourier transform of the correlation
function $\xi(r)$, or equivalently, in terms of the dimensionless power per logarithmic wavenumber,
$\Delta^2(k) \equiv k^3 P(k)/2\pi^2$:

$$\rho(r) = \int_0^\infty \frac{dk}{k} \, \Delta^2(k) \, \frac{\mathrm{Si}(kr)}{kr} . \qquad (3)$$

Here $\mathrm{Si}(z) = \int_0^z dt \, \sin(t)/t$ is the sine-integral.
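The sine-integral form can be recovered in two steps. The sketch below assumes an isotropic three-dimensional power spectrum with the Fourier convention $\xi(r) = \int d^3k \, P(k) \, e^{i \mathbf{k} \cdot \mathbf{r}} / (2\pi)^3$; this convention is our assumption, since the text does not state it.

```latex
\begin{align}
\xi(u) &= \int_0^\infty \frac{dk}{k}\, \Delta^2(k)\, \frac{\sin(ku)}{ku}, \\
\rho(r) &= \frac{K(r)}{2r} - 1 = \frac{1}{r}\int_0^r \xi(u)\, du
        = \int_0^\infty \frac{dk}{k}\, \Delta^2(k)\, \frac{1}{kr}\int_0^{kr} \frac{\sin t}{t}\, dt
        = \int_0^\infty \frac{dk}{k}\, \Delta^2(k)\, \frac{\mathrm{Si}(kr)}{kr}.
\end{align}
```

The second line uses the substitution $t = ku$ in the inner integral.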

Thus the reduced second moment measure, K(r), is closely related to other second-order 
measures such as the correlation function or the power spectrum, and it directly measures the 
mean over-density of absorbers on scales less than r. However, it has a number of distinct and 
desirable statistical properties which we examine below in § 2.3. 


2.2. Estimating K(r) 


Let $T_i$ be the comoving length of the $i$th line of sight, i.e., the section of the $i$th QSO
spectrum which has been effectively searched for absorbers. In Figure 1, we show the cumulative
distribution of the comoving lengths of 274 QSO lines of sight (over an approximate redshift range
$1.2 < z < 3.2$) in the Vanden Berk et al. catalog. Almost all of the lengths are shorter than 400
$h^{-1}$ Mpc, but the median length $\langle T \rangle = 350$ $h^{-1}$ Mpc, meaning that there is information on the
clustering of the absorbers on scales of 100 $h^{-1}$ Mpc or more.

Let $n_i$ be the number of absorbers found in the $i$th line of sight at positions $x_{i1}, \ldots, x_{in_i}$. If
there are a total of $m$ lines of sight, then the total comoving length and number of absorbers are
$T = \sum_{i=1}^{m} T_i$ and $n = \sum_{i=1}^{m} n_i$, respectively. An estimate for the mean number of absorbers per
unit comoving length is $\hat{\mathcal{N}} = n/T$.

From equation (1), a natural estimate of the reduced second moment measure, $K(r)$, is

$$\hat{K}(r) = \frac{T}{n(n-1)} \sum_{i=1}^{m} \sum_{j \neq j'}^{n_i} \theta\left( r - |x_{ij} - x_{ij'}| \right) , \qquad (4)$$

where $\theta(x)$ is the Heaviside step function. This estimate sums over pairs of absorbers that are on
the same line of sight and within distance $r$ of each other.

However, this estimator is biased low, because neighboring absorbers that lie outside the
line of sight cannot be counted. One way to remove the bias due to edge effects is to use the
rigid motion corrected estimator (Miles 1974; Osher & Stoyan 1981), which corrects for these edge
effects by weighting the summand in equation (4) by the inverse of a factor $f(|x_{ij} - x_{ij'}|)$ which depends on the
separation $|x_{ij} - x_{ij'}|$ relative to the lengths $T_i$ of the lines of sight.$^2$ This factor is the probability,
given that there is a first absorber somewhere on some line of sight, that a second absorber at
fixed separation from the first would also be contained within the same line of sight. We find that
this probability is $f(|x_{ij} - x_{ij'}|) = \sum_{k=1}^{m} (T_k - |x_{ij} - x_{ij'}|)^{+} / T$ (where the $+$ superscript indicates
that a summand is included in the sum only if it is positive), so that the edge-corrected estimator
we use is

$$\hat{K}(r) = \frac{T}{n(n-1)} \sum_{i=1}^{m} \sum_{j \neq j'}^{n_i} \frac{\theta\left( r - |x_{ij} - x_{ij'}| \right)}{f\left( |x_{ij} - x_{ij'}| \right)} . \qquad (5)$$

$^2$ Other estimators, which may have lower variance, have been found by Stein (1993), but we defer a discussion
and treatment of these to a later work.


2.3. Statistical Properties 


This estimator has the following statistical property: $E[n(n-1)\hat{K}(r)/T^2] = \mathcal{N}^2 K(r)$ exactly
under any homogeneous and isotropic model for the absorbers. Furthermore, while $\hat{K}(r)$ is not an
exactly unbiased estimator for $K(r)$, it is a consistent estimator for $K(r)$ in the sense that $\hat{K}(r)$
tends to $K(r)$ in probability as $m$ increases.

Let us contrast this estimator with the quantity $\xi_{aa}(\Delta r)$ used in Quashnock et al. (1996) to
measure clustering. For an interval $\Delta r = (r_1, r_2)$, $\xi_{aa}(\Delta r)$ is the number of pairs of absorbers
whose separation is in the interval $\Delta r$ divided by the number of pairs that would be expected in
$\Delta r$ if the $n$ absorbers were randomly distributed, minus 1. This statistic has the desirable property
that it tends to 0 as $m$ increases if $\xi(r)$ is identically 0 on $\Delta r$. Furthermore, positive values of
$\xi_{aa}(\Delta r)$ indicate clustering over the range of distances in $\Delta r$. However, it does not provide an
appropriate estimate of $\int_{r_1}^{r_2} \xi(r)\,dr$. In particular, $E[\xi_{aa}(\Delta r)]$ does not tend to $\int_{r_1}^{r_2} \xi(r)\,dr$ as $m$
increases. For example, if all lines of sight were of equal length $T_1$, it is possible to show that as
$m \to \infty$,

$$\xi_{aa}(\Delta r) \to \frac{\int_{r_1}^{r_2} (T_1 - u)\,\xi(u)\,du}{T_1 (r_2 - r_1) - \frac{1}{2}(r_2^2 - r_1^2)} \quad \text{in probability.} \qquad (6)$$

The fact that this limit generally depends on $T_1$ is undesirable for purposes of obtaining a
quantitative assessment of the clustering of absorbers. When $\xi$ is nearly constant on $\Delta r$, the limit
in equation (6) is approximately $\xi((r_1 + r_2)/2)$, as one would hope. Unfortunately, the moderate
size of this data set requires the use of rather wide bins, and Quashnock et al. (1996) use values of
$r_2 - r_1$ of 25 $h^{-1}$ Mpc and greater. Using the relationship between $\xi$ and $K$ in equation (2), we can
easily obtain a consistent estimator of $\int_{r_1}^{r_2} \xi(r)\,dr$ as $m$ increases. Specifically, $[\hat{K}(r_2) - \hat{K}(r_1)]/2 - (r_2 - r_1)$
converges in probability to $\int_{r_1}^{r_2} \xi(r)\,dr$ as $m$ increases.

By examining $\hat{K}(r)$ as a function of $r$, we can make a continuous exploration of clustering
as a function of scale, without the binning required when using a correlation function analysis.
In particular, this permits a more detailed examination of the relationship between small-scale
clustering (Quashnock & Vanden Berk 1998) and the superclustering found by Quashnock et
al. (1996). The reduced second moment measure estimator (eq. [5]) is easy to compute and has
well-understood statistical properties.
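To illustrate how little computation the estimator needs, the following is a minimal sketch of the pair sum with and without the edge-correction weight. The function names and the list-of-lists data layout are our own assumptions for illustration, not the code used for the results in this paper.

```python
def edge_weight(s, lengths):
    """f(s): probability that a partner at separation s stays on the same line."""
    total = sum(lengths)
    return sum(max(t - s, 0.0) for t in lengths) / total

def k_hat(r, positions, lengths, edge_corrected=True):
    """Estimate K(r) from same-line absorber pairs.

    positions[i] lists comoving positions on line i; lengths[i] is T_i.
    """
    total = sum(lengths)
    n = sum(len(p) for p in positions)
    if n < 2:
        raise ValueError("need at least two absorbers")
    pair_sum = 0.0
    for line in positions:
        for j, xj in enumerate(line):
            for jp, xjp in enumerate(line):
                if j == jp:
                    continue  # an absorber is not its own neighbor
                s = abs(xj - xjp)
                if s <= r:
                    w = edge_weight(s, lengths) if edge_corrected else 1.0
                    if w > 0.0:
                        pair_sum += 1.0 / w
    return total * pair_sum / (n * (n - 1))
```

With `edge_corrected=False` the function reduces to the uncorrected pair count, which makes it easy to see how much the edge weights raise the estimate at a given $r$.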








3. RESULTS 


We have used equation (5) to estimate the reduced second moment measure, $K(r)$, for 274
QSO lines of sight, obtained from the Vanden Berk et al. catalog. A total of 352 C IV absorbers
have been selected from this heterogeneous catalog, using selection criteria (Quashnock et al. 1996;
Quashnock & Vanden Berk 1998) designed to obtain as homogeneous a data set as possible. We
refer the reader to these papers for a detailed description of the selection criteria.

In Figure 2, we show our results for the quantity $\hat{K}(r)/2r = 1 + \rho(r)$ (solid line) for this
sample. This quantity has expectation value unity if there is no clustering of absorbers along lines
of sight (see eq. [2]). We have constructed 1000 data sets of 352 absorbers uniformly distributed
along the 274 QSO lines of sight, these lines having the same distribution of comoving lengths
as in our actual data sample (see Fig. 1). The 95% region of variation of $\hat{K}(r)/2r$ for these
1000 simulated data sets, about the expectation value of unity, is also shown in Figure 2 (dashed
lines). Our estimated value of $\hat{K}(r)/2r$ for the C IV absorber data set is much greater than the
upper limit of this band, for values of $r$ between 1 and 20 $h^{-1}$ Mpc. For example, for $r = 10$
$h^{-1}$ Mpc our estimate is more than $12\sigma$ above unity, meaning that a simulated data set with
uniformly distributed absorbers would essentially never have a $\rho$ as large as is measured. Thus
C IV absorbers cluster significantly on these scales.
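A simulation of this kind can be sketched as follows. This is an illustrative outline only: the helper names, the percentile rule, and any line lengths passed in are our own assumptions, not the catalog values or the exact procedure used for Figure 2.

```python
import random

def k_hat(r, positions, lengths):
    """Edge-corrected estimate of K(r); each unordered pair counts twice."""
    total = sum(lengths)
    n = sum(len(p) for p in positions)
    pair_sum = 0.0
    for line in positions:
        for a in range(len(line)):
            for b in range(a + 1, len(line)):
                s = abs(line[a] - line[b])
                if s <= r:
                    f = sum(max(t - s, 0.0) for t in lengths) / total
                    pair_sum += 2.0 / f
    return total * pair_sum / (n * (n - 1))

def uniform_catalog(n, lengths, rng):
    """Place n absorbers uniformly over the total path length."""
    positions = [[] for _ in lengths]
    total = sum(lengths)
    for _ in range(n):
        u = rng.uniform(0.0, total)
        for i, t in enumerate(lengths):
            if u < t:
                positions[i].append(u)
                break
            u -= t
        else:
            positions[-1].append(lengths[-1])  # round-off guard at the far end
    return positions

def null_band(r, n, lengths, n_sim=1000, seed=0):
    """Central 95% range of K_hat(r)/2r for unclustered absorbers."""
    rng = random.Random(seed)
    vals = sorted(
        k_hat(r, uniform_catalog(n, lengths, rng), lengths) / (2.0 * r)
        for _ in range(n_sim)
    )
    return vals[int(0.025 * n_sim)], vals[int(0.975 * n_sim) - 1]
```

Each absorber lands on a line with probability proportional to that line's length, so the simulated density is uniform over the whole sample rather than per line.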

We compare these results for $\hat{K}(r)/2r$ with those of Quashnock & Vanden Berk (1998),
who found that the correlation function of C IV absorbers on scales of 1 to 16 $h^{-1}$ Mpc is
consistent with a power law of the form $\xi(r) = (r_0/r)^{\gamma}$, with $r_0 = 3.4$ $h^{-1}$ Mpc and $\gamma = 1.75$,
by substituting this form of the correlation function into equation (2). In Figure 2 (light line),
we show the value of $K(r)/2r$ if absorbers have this power-law correlation function.$^3$ This form
of clustering appears to describe the estimated reduced second moment measure $\hat{K}(r)$ reasonably
well, out to scales $r \sim 20$ $h^{-1}$ Mpc; beyond that, there appears to be a break in the form of $\hat{K}(r)/2r$.
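Because $\int_0^r (r_0/u)^{\gamma}\,du$ diverges at the lower limit for $\gamma > 1$, the power-law curve has to be anchored at a reference scale. A minimal sketch of that substitution follows; the function name and the argument `k_ref` (the measured value at the reference scale, to be supplied by the user) are our own illustrative choices.

```python
def k_power_law(r, r_ref, k_ref, r0=3.4, gamma=1.75):
    """K(r) for xi(u) = (r0/u)**gamma, pinned so that K(r_ref) = k_ref.

    Integrates eq. (2) from r_ref to r; distances in h^-1 Mpc, gamma != 1.
    """
    integral_xi = r0**gamma * (r**(1.0 - gamma) - r_ref**(1.0 - gamma)) / (1.0 - gamma)
    return k_ref + 2.0 * (r - r_ref) + 2.0 * integral_xi
```

Dividing the returned value by `2 * r` gives the curve to compare with the estimated $\hat{K}(r)/2r$.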


We have investigated the significance of this excess by examining the quantity
$\Delta_{20}(r) \equiv [\hat{K}(r) - \hat{K}(20)]/[2(r - 20)]$, which estimates $1 + [\int_{20}^{r} \xi(u)\,du]/(r - 20)$, shown in panel a of Figure 3 (solid
line) for the same sample of C IV absorbers as in Figure 2. From equation (2), $\Delta_{20}(r)$ also has
expectation value of unity if the correlation function is zero on scales greater than 20 $h^{-1}$ Mpc.
From Figure 3, it appears that $\Delta_{20}(r)$ is greater than unity on scales $r \gtrsim 30$ $h^{-1}$ Mpc. We
have estimated the error in the estimate of $\Delta_{20}(r)$ by a bootstrap resampling method in which
we randomly pick 274 QSO lines of sight from the actual data sample, with replacement, i.e.,
allowing for the same line of sight to be picked multiple times (see Efron & Tibshirani 1993, or
Davison & Hinkley 1997, for a review of bootstrap methods for estimating errors). This method
ensures that the distribution of lengths of the resampled data sets is the same as that of the actual
sample (Fig. 1). In panel a of Figure 3 (dashed lines), we also show the bootstrap-estimated 95%
pointwise (i.e., for each value of $r$) confidence region for $\Delta_{20}(r)$. While there is some uncertainty in
the estimate of this quantity, it does appear that there is significant excess clustering on scales $r \gtrsim$
30 $h^{-1}$ Mpc. For example, when resampling data sets by the bootstrap method, $\Delta_{20}(50\ h^{-1}\ \mathrm{Mpc})$
is greater than unity 99.998% of the time.

$^3$ Since we have no information about $\xi(r)$ on scales smaller than 1 $h^{-1}$ Mpc (Quashnock & Vanden Berk 1998),
eq. [2] implies that $K(r)$ in this power-law case is determined only to within an additive constant. Here, we have
fixed $K(r)$ to its measured value at $r = 5$ $h^{-1}$ Mpc.

The bootstrap procedure for obtaining confidence intervals we have employed here has the 
desirable property that its validity does not require any special assumptions about the nature of 
the absorber distribution along a line of sight. More specifically, because it uses lines of sight as the
sampling unit in the resampling scheme, it only requires that the locations of absorbers on different
lines of sight are independent. Since the majority of lines of sight are not within 100 $h^{-1}$ Mpc
of any other line of sight, this independence assumption is reasonable. By using a resampling in
which groups of lines of sight are resampled rather than individual lines, we believe it should be 
possible to detect if this independence assumption is appropriate. We plan to investigate this 
possibility and other refinements of the bootstrapping procedure in future work. 
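The resampling scheme, with whole lines of sight as the sampling unit, can be sketched as below. The function names and data layout are our own assumptions for illustration; the statistic is the ratio $[\hat{K}(r) - \hat{K}(s)]/[2(r - s)]$ with $s = 20$ $h^{-1}$ Mpc in the case discussed above.

```python
import random

def k_hat(r, positions, lengths):
    """Edge-corrected estimate of K(r); each unordered pair counts twice."""
    total = sum(lengths)
    n = sum(len(p) for p in positions)
    pair_sum = 0.0
    for line in positions:
        for a in range(len(line)):
            for b in range(a + 1, len(line)):
                sep = abs(line[a] - line[b])
                if sep <= r:
                    f = sum(max(t - sep, 0.0) for t in lengths) / total
                    pair_sum += 2.0 / f
    return total * pair_sum / (n * (n - 1))

def delta_s(r, s, positions, lengths):
    """Clustering measure on scales between s and r (unity if xi = 0 there)."""
    return (k_hat(r, positions, lengths) - k_hat(s, positions, lengths)) / (2.0 * (r - s))

def bootstrap_delta(r, s, positions, lengths, n_boot=1000, seed=0):
    """Sorted bootstrap replicates of delta_s, resampling whole lines of sight."""
    rng = random.Random(seed)
    m = len(lengths)
    out = []
    for _ in range(n_boot):
        idx = [rng.randrange(m) for _ in range(m)]  # draw m lines with replacement
        pos_b = [positions[i] for i in idx]
        len_b = [lengths[i] for i in idx]
        if sum(len(p) for p in pos_b) < 2:
            continue  # too few absorbers to form a pair
        out.append(delta_s(r, s, pos_b, len_b))
    return sorted(out)
```

A pointwise 95% region follows from the 2.5th and 97.5th percentiles of the returned list, and the fraction of replicates above unity gives statements of the "greater than unity X% of the time" kind. Note that the total length, the absorber count, and the edge weights are all recomputed for each resampled data set.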

The procedures used by Quashnock & Vanden Berk (1998) and Quashnock et al. (1996) also 
assume that absorber locations on different lines of sight are independent. In addition, they both 
make use of further approximations about the absorber location process within a line of sight. 
Quashnock & Vanden Berk (1998) obtain approximate confidence intervals for the line-of-sight 
correlation function up to distances of 16 $h^{-1}$ Mpc by assuming that every pair of points whose
separation is in the interval $\Delta r$ is independent of every other such pair. This assumption may be a
good approximation when $r_2 - r_1$ is small, although simulations in Stoyan, Bertram, & Wendrock
(1993) suggest that such an assumption may often lead to overoptimistic confidence intervals.

On larger scales, for which Quashnock et al. (1996) have used fairly wide bins (greater than 25
$h^{-1}$ Mpc wide), assuming independence between pairs with distance in $\Delta r$ may be problematic.
There, the confidence intervals are based on assuming that one can ignore correlations beyond 
second-order in absorber locations along a line of sight. While we have no evidence that such 
an assumption is wrong, the bootstrapping procedure we employ is valid whether or not this 
assumption is reasonable. 

We have also searched for clustering on scales greater than 50, 100, and 150 $h^{-1}$ Mpc, by
examining the quantities $\Delta_{50}(r)$, $\Delta_{100}(r)$, and $\Delta_{150}(r)$, shown (solid lines) in panels b, c, and d,
respectively, of Figure 3. Again we also show the bootstrap-estimated 95% confidence region for
each quantity (dashed lines). We find that $\Delta_{50}(r)$ is significantly greater than unity, for scales $r >$
50 $h^{-1}$ Mpc, meaning that there is significant clustering on those scales. For example, when resampling
data sets by the bootstrap method, $\Delta_{50}(100\ h^{-1}\ \mathrm{Mpc})$ is greater than unity 99.91% of the time.
However, $\Delta_{150}(r)$ is statistically consistent with unity for all $r > 150$ $h^{-1}$ Mpc, and $\Delta_{100}(r)$ is
consistent with unity everywhere except (marginally) for $r \sim 200$ $h^{-1}$ Mpc. This supports the
conclusion of Quashnock et al. (1996) that at present there is no significant evidence for clustering
of absorbers on scales greater than 100 $h^{-1}$ Mpc.



4. DISCUSSION 


We have demonstrated that the line-of-sight clustering of QSO heavy-element absorption-line
systems can be examined using a new measure of clustering, called the reduced second moment
measure, $K(r)$, that directly measures the mean over-density of absorbers on scales $\lesssim r$. By
estimating $K(r)$, we find that the absorbers are strongly clustered on scales from 1 to 20 $h^{-1}$ Mpc,
in a manner that is consistent with a power-law correlation function of the form found by
Quashnock & Vanden Berk (1998). The form and amplitude of this clustering strongly suggest
that the absorbers are tracing the large-scale structure seen in the distribution of galaxies and
clusters.

However, because we have only examined the clustering of absorbers in one dimension, along
the line of sight, there remains the possibility that some or all of the excess clumping is due
to velocity effects, i.e., groups of component absorbers spread out in redshift due to velocity
dispersion. (Note that at redshift $\langle z \rangle = 2.2$, 1 $h^{-1}$ Mpc corresponds to velocity differences $\Delta v =$
180 km s$^{-1}$ in the rest frame. The flattening in $1 + \rho(r)$ seen near 1 $h^{-1}$ Mpc in Figure 2 may be
due to velocity dispersion, as well as limited spectral resolution.) This has been argued by Crotts,
Burles, & Tytler (1997), who explore the spatial clustering of C IV systems along adjacent lines
of sight, and claim that it is significantly weaker than clustering along a line of sight. Quashnock
& Vanden Berk (1998) have shown that, whether due to peculiar motions inside clusters or to
actual spatial clustering on megaparsec scales, the scale, the form, and the amplitude of the
clustering are all indicative of an association of strong absorbers with clusters. Such an association
is also supported by observations of “Lyman break” galaxies (see § 1).
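The velocity scale quoted in the parenthetical note can be checked with a short calculation. Differentiating the comoving distance relation for $q_0 = 0.5$ and $\Lambda = 0$ and using $\Delta v = c\,\Delta z/(1 + z)$ gives $\Delta v / \Delta r = H_0 \sqrt{1 + z}$; the sketch below (function name ours, for illustration) evaluates it.

```python
import math

def velocity_per_comoving_mpc(z, h0_over_h=100.0):
    """Rest-frame velocity difference (km/s) spanned by 1 h^-1 Mpc comoving.

    Assumes q0 = 0.5 and Lambda = 0, for which dv/dr = H0 * sqrt(1 + z).
    """
    return h0_over_h * math.sqrt(1.0 + z)
```

At $z = 2.2$ this evaluates to roughly 179 km s$^{-1}$, in line with the rounded value of 180 km s$^{-1}$ quoted above.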

In Figure 2, there is a sharp break in the form of $\hat{K}(r)$ at 20 $h^{-1}$ Mpc. It thus appears that
(for $q_0 = 0.5$) this is the scale marking the boundary of power-law clustering on smaller scales.
Using the reduced second moment measure has permitted an approximate determination of this
break. From Figure 3 (panels a-d) there is evidence for clustering on scales of up to 100 $h^{-1}$ Mpc
(but not on larger scales) in excess of that which would be expected from a smooth transition
to homogeneity.

One possible interpretation of this excess is that it is due to superclustering on scales of 50
to 100 $h^{-1}$ Mpc (Heisler, Hogan, & White 1989; Dinshaw & Impey 1996; Williger et al. 1991;
Quashnock et al. 1996, and references therein), much like what is seen locally in the distribution of
galaxies. If true, this means that these absorbers are biased tracers of the higher-density regions
of space, and that agglomerations of strong absorbers along a line of sight are indicators of clusters
and superclusters.

However, Richards et al. (1999) have recently claimed that there may be evidence in the
data catalogs that the number of C IV absorbers along the line of sight depends on the intrinsic
properties of the QSO. These authors argue that there may be a significant contamination of true
intervening systems along the line of sight by absorbers that are actually associated with the QSO,
and that such a contamination may extend to relative velocities as great as 75000 km s$^{-1}$ from the
QSO. In this work, we have adopted the standard cutoff and excluded absorbers that are closer
than 5000 km s$^{-1}$ to the QSO (Foltz et al. 1988; this corresponds to comoving distances of about
30 $h^{-1}$ Mpc in this work).


It is possible that such a contamination is present in the large-scale excess $\rho(r)$ in Figure 2.
A more detailed analysis of this possible effect will require an indicator capable of distinguishing,
at least statistically, associated absorption-line systems from true intervening ones. While the
exact nature of this large-scale excess is still uncertain, its existence on scales of 20 $h^{-1}$ Mpc has
been unambiguously revealed by the present analysis.


This work was supported in part by NASA grant NAG 5-4406 and NSF grant DMS 97-09696 
(J. M. Q.), and by NSF grant DMS 95-04470 (M. L. S.). 

Fig. 1. Cumulative distribution of the comoving lengths of the 274 QSO lines of sight, containing
a total of 352 C IV absorbers obtained from the Vanden Berk et al. catalog. The median length
$\langle T \rangle = 350$ $h^{-1}$ Mpc ($q_0 = 0.5$).

Fig. 2. Estimate of the reduced second moment measure, $\hat{K}(r)$, divided by $2r$ (solid line), for the
274 QSO lines of sight, containing a total of 352 C IV absorbers obtained from the Vanden Berk
et al. catalog. For comparison, we show $K(r)/2r$ for absorbers that have a power-law correlation
function, $\xi(r) = (r_0/r)^{\gamma}$, with $r_0 = 3.4$ $h^{-1}$ Mpc and $\gamma = 1.75$ (light line; see text). Also shown
is the 95% region of variation of $\hat{K}(r)/2r$ for 1000 simulated data sets of unclustered absorbers,
about the expectation value of unity (dashed lines).







Fig. 3. Clustering measure (solid lines) on scales greater than: (a) 20 $h^{-1}$ Mpc, $\Delta_{20}(r)$; (b) 50
$h^{-1}$ Mpc, $\Delta_{50}(r)$; (c) 100 $h^{-1}$ Mpc, $\Delta_{100}(r)$; (d) 150 $h^{-1}$ Mpc, $\Delta_{150}(r)$; for the same sample of C IV
absorbers as in Fig. 2. Also shown are the bootstrap-estimated 95% pointwise confidence regions
for each quantity (dashed lines; see text).










